Bioconductor (BioC) is an open-source, community-driven software project that provides a framework of tools and databases for the analysis of biological data in R.
To provide -
Current release is Bioconductor 3.6
Includes 2712 packages.
All packages have been
- Reviewed.
- Tested and evaluated automatically.
- Actively maintained and updated.
Before being accepted to Bioconductor, all new packages are reviewed to ensure they meet the Bioconductor guidelines.
The review includes automatic testing of packages, as well as an open review on the Bioconductor GitHub site.
Review ensures
Installing a Bioconductor package is quite straightforward. Every Bioconductor package comes with a description of the R installation command, which we can simply copy and paste.
Here we use the source function to load a script containing functions for Bioconductor package installation. We then use the newly acquired biocLite function to install the package of choice in a manner similar to install.packages.
source("https://bioconductor.org/biocLite.R")
biocLite("basecallQC")
All dependencies and their required versions are resolved for us. We must be careful, however, to check the version of Bioconductor we are using.
biocVersion()
## [1] '3.5'
If we wish to update to the latest Bioconductor release, we can request the upgrade through biocLite by calling biocLite("BiocUpgrade").
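The version check and upgrade can be sketched as follows. This is a minimal sketch that assumes the BiocInstaller tooling used in this course (Bioconductor 3.7 and earlier) and a working network connection. The upgrade call is commented out because it changes the installed packages. From Bioconductor 3.8 onwards, biocLite was replaced by the BiocManager package, so the equivalent calls are included as comments for newer installations.

```r
# Load the BiocInstaller helpers (this defines biocLite and biocVersion).
source("https://bioconductor.org/biocLite.R")

# Report which Bioconductor version this R installation is set up for.
biocVersion()

# Upgrade to the latest Bioconductor release supported by this version of R.
# Commented out because it reinstalls packages:
# biocLite("BiocUpgrade")

# Equivalent calls for Bioconductor 3.8 and later (BiocManager package):
# install.packages("BiocManager")
# BiocManager::version()
# BiocManager::install("basecallQC")
```

Each Bioconductor release is tied to a particular version of R, so an upgrade of Bioconductor may first require an upgrade of R itself.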
.pull-left[
All packages will have a reference manual containing the help pages for every function.
This will include importantly
]
.pull-right[]
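The help pages that make up a reference manual can also be opened from within R. The sketch below uses the stats package and its median function only because they ship with every R installation; any installed Bioconductor package and function can be substituted.

```r
# Show the index of help pages for an installed package.
help(package = "stats")

# Open the help page for a single function (same as help("median")).
?median

# Search installed help pages when the function name is unknown.
help.search("standard deviation")

# Run the examples given at the bottom of a function's help page.
example(median)
```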
.pull-left[
All packages will also include at least one vignette.
These vignettes detail a typical usage of the package with working examples included.
]
.pull-right[
]
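Vignettes can be listed and opened from within R. This is a minimal sketch that uses the grid package as a stand-in because it ships with R; replace "grid" with an installed Bioconductor package such as basecallQC. Opening a vignette starts a viewer, so those calls are restricted to interactive sessions.

```r
# List the vignettes provided by an installed package.
vigs <- vignette(package = "grid")

# Show the vignette names and titles.
vigs$results[, c("Item", "Title")]

if (interactive()) {
  # Open a single vignette by name.
  print(vignette("grid", package = "grid"))

  # Browse all of the package's vignettes as links in a web browser.
  browseVignettes("grid")
}
```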
Bioconductor packages cover a wide range of biological data types.
This course focuses on high-throughput sequencing, so we will concentrate on the main packages for this type of data.
This includes methods for handling common genomics data types.
Genomic sequences stored as FASTA files are handled using the Biostrings package.
Genomic intervals stored as BED files are handled using the rtracklayer and GenomicRanges packages.
Genomic scores stored as wig or bigWig files are handled using the rtracklayer and GenomicRanges packages.
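As a minimal sketch of reading these file types, the example below writes two tiny files to a temporary location and reads them back. It assumes the Biostrings, rtracklayer and GenomicRanges packages are already installed. The file contents are made up for illustration.

```r
library(Biostrings)     # FASTA sequences
library(rtracklayer)    # BED, wig and bigWig import
library(GenomicRanges)  # GRanges container returned by import()

# Write a tiny FASTA file so that the example is self-contained.
fasta_file <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "ACGTACGTAC", ">seq2", "GGGCCC"), fasta_file)

# Read the sequences into a DNAStringSet and inspect their lengths.
seqs <- readDNAStringSet(fasta_file)
width(seqs)

# Write a tiny three-column BED file (chromosome, start, end).
bed_file <- tempfile(fileext = ".bed")
writeLines(c("chr1\t0\t100", "chr1\t200\t350"), bed_file)

# Import the intervals as a GRanges object.
# BED starts are 0-based while GRanges starts are 1-based,
# so import() shifts the start coordinates by one.
regions <- import(bed_file, format = "BED")
regions

# wig and bigWig score files are read with the same import() function,
# for example import("scores.bw") for a hypothetical bigWig file.
```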
As well as software packages, Bioconductor maintains a number of annotation packages.
These include microarray annotation, gene to ID mappings, functional annotation of genes, genome sequence information and gene/transcript models.
Information on gene annotation for model organisms is contained within the org.db packages.
Format is org.{species}.{ID type}.db
Homo sapiens annotation with Entrez Gene IDs – org.Hs.eg.db
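A typical use of an org.db package is translating between identifier types. This is a minimal sketch, assuming org.Hs.eg.db is installed; the two gene symbols are arbitrary examples. The select function is called with its AnnotationDbi namespace to avoid clashes with functions of the same name in other packages.

```r
library(org.Hs.eg.db)

# List the identifier types that can be used as keys.
keytypes(org.Hs.eg.db)

# Map gene symbols to Entrez Gene IDs and full gene names.
ids <- AnnotationDbi::select(org.Hs.eg.db,
                             keys = c("GAPDH", "TP53"),
                             keytype = "SYMBOL",
                             columns = c("ENTREZID", "GENENAME"))
ids
```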
Genomic sequence information is held within the BSgenome packages.
Format is BSgenome.{species}.{source}.{major version}
Homo sapiens genome sequence from UCSC’s version hg19 – BSgenome.Hsapiens.UCSC.hg19
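Sequence can be pulled out of a BSgenome package for any region of interest. This is a minimal sketch, assuming BSgenome.Hsapiens.UCSC.hg19 and GenomicRanges are installed; the coordinates are arbitrary and chosen only for illustration.

```r
library(BSgenome.Hsapiens.UCSC.hg19)
library(GenomicRanges)

# The package provides one object, named after the package itself.
genome <- BSgenome.Hsapiens.UCSC.hg19

# Chromosome names and lengths held in the package.
head(seqlengths(genome))

# Define a 20 bp region and extract its sequence as a DNAStringSet.
region <- GRanges("chr1", IRanges(start = 1000001, end = 1000020))
dna <- getSeq(genome, region)
dna
```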
Gene models are held in the TxDb packages.
Format is TxDb.{species}.{source}.{major version}.{table}
Homo sapiens gene build from UCSC’s version hg19 known gene table – TxDb.Hsapiens.UCSC.hg19.knownGene
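Gene models in a TxDb package are retrieved as genomic ranges. This is a minimal sketch, assuming TxDb.Hsapiens.UCSC.hg19.knownGene is installed (it loads GenomicFeatures, which supplies the accessor functions used here).

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

# The package provides one TxDb object, named after the package itself.
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# One range per gene.
gene_ranges <- genes(txdb)
gene_ranges

# One range per transcript.
tx_ranges <- transcripts(txdb)

# Exons grouped by the gene they belong to (a GRangesList).
exons_by_gene <- exonsBy(txdb, by = "gene")
length(exons_by_gene)
```

The gene identifiers in this TxDb are Entrez Gene IDs, so the results can be joined to annotation from org.Hs.eg.db.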